ABSTRACT

This paper extends previous results by the first author (see, for example,
"Mathematical Models for Statistical Decision Theory" in Optimizing Methods
in Statistics, J. S. Rustagi, Ed., Academic Press, 1971).

The approximation theory model describes a class of optimality principles
in statistical decision theory as follows. Let $S$ be the risk set of a
statistical decision problem, that is,
$S = \{R_\varphi(\theta),\ \theta \in \Theta,\ \varphi \in \Phi\}$, where $\Phi$ is
the collection of randomized decision procedures, $\Theta$ is the parameter space
and $R_\varphi(\theta)$ is the risk function of the statistical decision procedure
$\varphi$. We interpret $S$ as a set in the normed linear space $\mathcal{L}$. Let
$v = v(\theta)$ satisfy $v(\theta) \le R_\varphi(\theta)$ for all $\varphi \in \Phi$
and all $\theta \in \Theta$. Then $s_0 \in S$ is said to be $(v, \mathcal{L})$
optimal if $\|s_0 - v\| \le \|s - v\|$ for all $s \in S$.

It is easily seen that many well-known optimality principles of statistics
are of this type, such as Bayes rules and minimax rules.

In this paper, characterization theorems for this class of optimality
principles are given.
Bernard Harris* and Gerhard Heindl**

1. Introduction and Summary. The purpose of this paper is to further explore the
relationship between statistical decision theory and approximation theory initially
presented in B. Harris [5]. As indicated there, many results of approximation theory,
when suitably modified, have interpretations as results in statistical decision theory.
Conversely, many results of statistical decision theory are also capable of
reformulation as results in approximation theory.

In this paper, some results of approximation theory will be modified, reformulated
and reinterpreted as results of statistical decision theory.

Let $\Theta \ne \emptyset$ be a given set with elements $\theta$. Further let $\Phi$
be a non-empty family of probability measures defined on $\mathcal{B}_D$, a
$\sigma$-algebra of subsets of a set $D$. Thus, we are given the family of measure
spaces $(D, \mathcal{B}_D, \varphi)$, $\varphi \in \Phi$. For each $\varphi \in \Phi$,
let $R_\varphi$ be a mapping from $\Theta$ into the extended real numbers. Let
$R = \{R_\varphi,\ \varphi \in \Phi\}$, that is, $R$ is a family of extended real
valued functions of $\theta$, $\theta \in \Theta$. It will be assumed that
$R_\varphi(\theta)$ is uniformly bounded from below, in other words, there exists a
real number $M$ such that

(1) $R_\varphi(\theta) \ge M$ for all $\varphi \in \Phi$ and all $\theta \in \Theta$.

In addition we also require that $R$ be a convex set, that is, for every pair of
functions $R_{\varphi_1}, R_{\varphi_2} \in R$ and every $\lambda$, $0 \le \lambda \le 1$,

$R_{\varphi(1,2)} = \lambda R_{\varphi_1} + (1 - \lambda) R_{\varphi_2} \in R$.
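
The convexity requirement can be illustrated by a minimal sketch, assuming a finite parameter space so that a risk function is simply a list of numbers. The convex combination then corresponds to randomizing between two procedures. The function name `mix_risk` and the numerical risks are hypothetical and are not taken from the paper.

```python
def mix_risk(r1, r2, lam):
    """Risk of the randomized procedure that uses rule 1 with probability lam
    and rule 2 with probability 1 - lam (finite parameter space assumed)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(r1, r2)]

# Hypothetical risk functions on a two-point parameter space.
r1 = [1.0, 4.0]
r2 = [3.0, 2.0]

# The equal mixture is again a risk function in the (convex) risk set.
mixed = mix_risk(r1, r2, 0.5)
print(mixed)
```

Under these assumptions the mixture is computed pointwise in the parameter, which is exactly the sense in which the risk set is closed under convex combinations.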

In the usual terminology employed in statistics, $\Theta$ is the parameter space and
$\theta$ is called a parameter. $R$ is the risk set of a statistical decision problem
and $R_\varphi$ is called a risk function. The reader is referred to standard
treatises on statistical decision theory, such as D. Blackwell and M. A. Girshick [1]
or T. S. Ferguson [4], for a detailed description of the manner in which the risk set
is determined by a statistical decision problem.
Within the framework described above a statistical decision problem may be
described as follows. The statistician selects a $\varphi \in \Phi$ and the
performance of the $\varphi$ that he selects is described by the risk function
$R_\varphi$. Consequently optimality in statistics may be roughly described by the
intuitive statement: "$\varphi$ is optimal if $R_\varphi(\theta)$ is as small as
possible". However, in general, there is no $\varphi_0 \in \Phi$ such that
$R_{\varphi_0}(\theta) \le R_\varphi(\theta)$ for all $\theta$ and all
$\varphi \in \Phi$. Consequently, there is no single optimality principle which
enjoys universal acceptance. In a previous article on optimality principles in
statistics, Harris [5] proposed a class of optimality principles called
$(v, \mathcal{L})$ optimality, which we will now define.

Let $\mathcal{L}$ be a normed linear space with a partial ordering. Assume that for
$0 \le x \le y$, $x, y \in \mathcal{L}$, we have $\|x\| \le \|y\|$. We refer to this
property by saying that the space $\mathcal{L}$ is endowed with a monotonic norm. Let
$S_\mathcal{L} = R \cap \mathcal{L}$, that is, the restriction of the risk set to
those elements with finite norms. To avoid trivial situations, we will assume that
$S_\mathcal{L} \ne \emptyset$. Let $v$ be a distinguished point in $\mathcal{L}$
satisfying $v \le R$, i.e., $v = v(\theta) \le R_\varphi(\theta)$ for all
$\varphi \in \Phi$ and all $\theta \in \Theta$. The partial ordering in $\mathcal{L}$
will be defined as the partial ordering on the risk functions $R_\varphi(\theta)$
induced by admissibility.

Thus we identify the points $s$ in $S_\mathcal{L}$ with the risk functions, so that
$s_1 \le s_2$ means $R_{\varphi_1}(\theta) \le R_{\varphi_2}(\theta)$ for all
$\theta$, where $s_1$ is identified with $R_{\varphi_1}$ and $s_2$ is identified with
$R_{\varphi_2}$. Similarly, $s_1 < s_2$ means
$R_{\varphi_1}(\theta) \le R_{\varphi_2}(\theta)$ for all $\theta$ and there exists a
$\theta_0 \in \Theta$ such that $R_{\varphi_1}(\theta_0) < R_{\varphi_2}(\theta_0)$.
Thus, $v \le R$ insures $v \le S_\mathcal{L}$.

Then, with this identification, we say that $s_0$ (equivalently $\varphi_0$) is
$(v, \mathcal{L})$ optimal if $\|s_0 - v\| \le \|s - v\|$ for all
$s \in S_\mathcal{L}$.

It is easily seen that appropriate specifications of $v$ and $\mathcal{L}$ give rise
to various standard optimality principles of statistical theory, such as Bayes rules,
minimax rules and minimax regret rules. Specifically, let $v = v(\theta) = M$ and let
$\tau$ be a specified prior probability measure. Then, if we take
$\mathcal{L} = L_1(\tau)$, that is, the space of $\tau$-integrable functions, the
assertion that $s_0$ is $(v, \mathcal{L})$ optimal is equivalent to the assertion
that $s_0$ is Bayes against $\tau$.

One can provide the following intuitive justification for the notion of
$(v, \mathcal{L})$ optimality. Since the partial ordering on $\mathcal{L}$ has been
identified with admissibility, one can say that $s_1$ is "at least as good" as $s_2$
whenever $s_1 \le s_2$ and $s_1$ "is better than $s_2$" whenever $s_1 < s_2$. The set
$S_\mathcal{L}$ represents the totality of decision procedures to be considered by
the statistician. Thus $v \le R$ says that $v$ is "at least as good" as any
alternative available to the statistician. Hence $v$ should be regarded as an ideal
decision procedure. Naturally $v$ need not be in $S_\mathcal{L}$; $v$ should be
regarded as what you would do if you were able to, such as if you had "perfect
information". Hence, it is natural to seek to come as close to $v$ as possible and
thus the idea of minimizing the norm from $v$ to $S_\mathcal{L}$. The distance from
$v$ to $S_\mathcal{L}$ may be regarded as a measurement of the "uncertainty".
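
The equivalence with Bayes rules can be checked numerically in a small sketch, assuming a finite parameter space and a finite list of candidate risk vectors. Since $s - v \ge 0$ and $v = M$ is constant, the $L_1(\tau)$ distance from $v$ to $s$ equals the Bayes risk of $s$ minus $M$, so both criteria select the same point. The names `l1_norm` and `vl_optimal`, the prior and the risk values are hypothetical.

```python
def l1_norm(f, tau):
    """Norm of f in L_1(tau) when the parameter space is finite."""
    return sum(abs(x) * w for x, w in zip(f, tau))

def vl_optimal(risks, v, norm):
    """Return the risk vector in `risks` that is closest to v in the given norm."""
    return min(risks, key=lambda s: norm([a - b for a, b in zip(s, v)]))

# Hypothetical risk functions on a two-point parameter space.
risks = [[1.0, 4.0], [3.0, 2.0], [2.0, 3.0]]
M = 0.0                 # uniform lower bound on all risks
v = [M, M]              # the distinguished point v(theta) = M
tau = [0.75, 0.25]      # hypothetical prior probability measure

# Closest point to v in L_1(tau) ...
best = vl_optimal(risks, v, lambda f: l1_norm(f, tau))
# ... and the point with the smallest Bayes risk against tau.
bayes = min(risks, key=lambda s: sum(x * w for x, w in zip(s, tau)))
print(best, bayes)
```

In this example both selections return the same risk vector, as the equivalence stated above predicts.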

In approximation theory a point $x$ and a set $S$ in a normed linear space
$\mathcal{L}$ are specified. $s_0 \in S$ is said to be a best approximation to $x$
from the set $S$ if $\|s_0 - x\| \le \|s - x\|$ for $s \in S$. In many applications
$S$ is a linear subspace of $\mathcal{L}$ or a convex set in $\mathcal{L}$. Extensive
discussions of the theory of best approximations may be found in R. B. Holmes [7] or
I. Singer [10]. However, if we add the additional conditions that $\mathcal{L}$ has a
partial ordering and a monotonic norm and that $v \le S$, where $S$ is a convex set,
then this becomes precisely the notion of $(v, \mathcal{L})$ optimality in
statistical decision theory. Thus $(v, \mathcal{L})$ optimality may be interpreted as
best one-sided approximation from a distinguished point to a convex set in a normed
linear space with a monotonic norm.
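
As a rough illustration of best one-sided approximation, the sketch below searches the convex set of mixtures of two hypothetical risk vectors for the point closest to $v$ in the supremum norm, which is monotonic for the pointwise ordering. With $v$ constant this is the minimax criterion. The grid search, the function names and the numbers are assumptions made for illustration only.

```python
def sup_norm(f):
    """Supremum norm on a finite parameter space."""
    return max(abs(x) for x in f)

def best_mixture(r1, r2, v, norm, steps=1000):
    """Grid search over lam in [0, 1] for the mixture lam*r1 + (1-lam)*r2
    that minimizes the distance to v in the given norm."""
    best_lam, best_dist = None, float("inf")
    for i in range(steps + 1):
        lam = i / steps
        s = [lam * a + (1.0 - lam) * b for a, b in zip(r1, r2)]
        dist = norm([x - y for x, y in zip(s, v)])
        if dist < best_dist:
            best_lam, best_dist = lam, dist
    return best_lam, best_dist

# Hypothetical risk vectors; v = 0 lies below every point of the set.
lam, dist = best_mixture([1.0, 4.0], [3.0, 2.0], [0.0, 0.0], sup_norm)
print(lam, dist)
```

For these values the mixture has risk $(3 - 2\lambda,\ 2 + 2\lambda)$, so the two coordinates are equalized at $\lambda = 1/4$ with distance $5/2$, which the grid search recovers.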

We now show how methods of approximation theory may be used to obtain
characterizations of $(v, \mathcal{L})$ optimality.

2. Characterization of $(v, \mathcal{L})$ optimality. In this section we present some
theorems which characterize $(v, \mathcal{L})$ optimal decision procedures.

Theorem 1. Let $\mathcal{L} \ne \{0\}$ be a partially ordered normed linear space of
real valued functions of $\theta$, $\theta \in \Theta$, equipped with a monotonic
norm. Let $S$ be a given convex set in $\mathcal{L}$ and let $v \in \mathcal{L}$ with
$v \le S$. Then $s_0 \in S$ is $(v, \mathcal{L})$ optimal if and only if there is a
linear functional $L \in \mathcal{L}^*$, the topological dual space of $\mathcal{L}$,
such that (1) $\|L\| = 1$, (2) $L \in K^*$, the dual of the positive cone $K$ in
$\mathcal{L}$, (3) $L(s_0) \le L(s)$, $s \in S$, (4) $L(s_0 - v) = \|s_0 - v\|$.


Proof. Sufficiency.

$\|s_0 - v\| = L(s_0 - v) \le L(s - v) \le \|L\| \, \|s - v\| = \|s - v\|$

for every $s \in S$ and hence $s_0$ is $(v, \mathcal{L})$ optimal.


Necessity. $K$ is a non-empty cone with $0$ as vertex; we first show that
$\bar{K} \ne \mathcal{L}$. If $K = \{0\}$, then $K = \bar{K}$ and
$\bar{K} \ne \mathcal{L}$ since $\mathcal{L} \ne \{0\}$. If $K \ne \{0\}$, then there
exists an $x \ge 0$, $x \ne 0$. Now if $\bar{K} = \mathcal{L}$, we must have
$-x \in \bar{K}$ and there is a sequence $\{y_k\}$ with $y_k \ge 0$ for all $k$ and
$\lim_{k \to \infty} y_k = -x$. Then $0 \le x \le x + y_k$ and
$0 \le \|x\| \le \|x + y_k\|$, $k = 1, 2, \ldots$. Since
$\lim_{k \to \infty} \|x + y_k\| = 0$, we have $\|x\| = 0$ and $x = 0$, contradicting
$x \ne 0$.

We now divide the balance of the proof into two parts, treating separately the
cases $v \in S$ and $v \notin S$.


If $v \in S$, then $\|v - s_0\| = 0$ and $v = s_0$. Then since
$\bar{K} \ne \mathcal{L}$, there is a linear functional $L \in \mathcal{L}^*$,
$L \ne 0$, such that $L(x) \ge 0$ for all $x \in \bar{K}$. Without loss of
generality, we can set $\|L\| = 1$. Then for every $s \in S$,
$s - s_0 = s - v \ge 0$ and $s - v \in K \subset \bar{K}$, which implies
$L(s - v) \ge L(s_0 - v) = 0$; hence $L(s) \ge L(s_0)$.


Now suppose $v \notin S$. Then $\|s_0 - v\| = d > 0$. Let
$B(v,d) = \{x \in \mathcal{L} : \|x - v\| < d\}$ and let
$\tilde{S} = \{s + x : s \in S,\ x \in K\}$. $\tilde{S}$ is obviously convex since
$S$ and $K$ are convex. Then for $z \in \tilde{S}$, there exist $s \in S$ and
$x \ge 0$ such that $0 \le s - v \le s - v + x = z - v$. Hence
$\|s - v\| \le \|z - v\|$ and $z \notin B(v,d)$. Consequently,
$\tilde{S} \cap B(v,d) = \emptyset$ and there exists a separating hyperplane for
$\tilde{S}$ and $B(v,d)$. That is, there exists an $L \in \mathcal{L}^*$ and a real
constant $c$ such that $L \ne 0$ and

$L(x) \le c$, $x \in B(v,d)$,

$L(z) \ge c$, $z \in \tilde{S}$.

Further, we can set $\|L\| = 1$. Clearly $s_0 \in \overline{B(v,d)}$; also
$s_0 \in S$ and $S \subset \tilde{S}$. Therefore $L(s) \ge c$ for all $s \in S$ and
$L(s_0) = c$, establishing $L(s_0) \le L(s)$. For $x \in K$,
$s_0 + x \in \tilde{S}$ and hence $L(s_0 + x) \ge L(s_0)$, which implies
$L(x) \ge 0$.

Finally, for any $z$ with $\|z\| \le 1$, $\|v + dz - v\| = d\|z\| \le d$ so that
$v + dz \in \overline{B(v,d)}$. Consequently, $L(v + dz) \le L(s_0) = c$. Similarly
$L(dz) \le L(s_0 - v)$ and thus $L(z) \le d^{-1} L(s_0 - v)$. Hence
$\|L\| \le d^{-1} L(s_0 - v)$ and
$d \le L(s_0 - v) \le \|L\| \, \|s_0 - v\| = d$, establishing the theorem.

A special case of Theorem 1 is a well-known result in statistics and is given
by the following corollary.

Corollary 1. Let $\Theta$ be a compact Hausdorff space and let
$S = \{R_\varphi(\theta),\ \varphi \in \Phi\}$ be a convex set of continuous
functions of $\theta$, uniformly bounded from below. Then $s_0 \in S$ is minimax if
and only if there exists a least favorable distribution $\tau_0$ and $s_0$ is Bayes
against $\tau_0$.

Proof. Let $\mathcal{L} = C(\Theta)$, the space of continuous real valued functions
on $\Theta$ with $\|f\| = \sup_{\theta \in \Theta} |f(\theta)|$ for
$f \in \mathcal{L}$. Then $\mathcal{L}^*$ is isometric to the set of regular
countably additive set functions on the sets of the Borel $\sigma$-algebra of
$\Theta$ (see N. L. Dunford and J. T. Schwartz [3], p. 265). For
$L \in \mathcal{L}^*$, $\|L\|$ is the total variation of the set function $L$.
Specializing Theorem 1 to this case, the conditions $L \ge 0$ and $\|L\| = 1$
establish that $L$ is representable by a probability measure $\tau_0$ on $\Theta$.
$L(s_0) \le L(s)$ implies that $s_0$ is Bayes against $\tau_0$, and
$L(s_0 - v) = \|s_0 - v\|$ insures that $\tau_0$ is the least favorable distribution.
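
The four conditions of Theorem 1 can be verified directly in a finite-dimensional sketch, assuming a two-point parameter space with the supremum norm. A linear functional is then a weight vector, its dual norm is the sum of the absolute weights, and positivity means non-negative weights, so conditions (1) and (2) say the weights form a probability distribution. The helper names, the risk values and the candidate least favorable distribution `tau0` are hypothetical, and condition (3) is checked only on a grid of mixtures.

```python
def functional(tau, f):
    """Value at f of the linear functional represented by the weights tau."""
    return sum(w * x for w, x in zip(tau, f))

def check_theorem1(tau, s0, v, candidates, tol=1e-12):
    """Check conditions (1)-(4) of Theorem 1 for the sup norm on R^n."""
    diff = [a - b for a, b in zip(s0, v)]
    unit_norm = abs(sum(abs(w) for w in tau) - 1.0) <= tol            # (1)
    positive = all(w >= 0.0 for w in tau)                             # (2)
    supports = all(functional(tau, s0) <= functional(tau, s) + tol
                   for s in candidates)                               # (3)
    attains = abs(functional(tau, diff)
                  - max(abs(x) for x in diff)) <= tol                 # (4)
    return unit_norm and positive and supports and attains

# Hypothetical risk set: mixtures of two risk vectors, sampled on a grid.
r1, r2 = [1.0, 4.0], [3.0, 2.0]
candidates = [[(i / 100) * a + (1 - i / 100) * b for a, b in zip(r1, r2)]
              for i in range(101)]
v = [0.0, 0.0]
s0 = [2.5, 2.5]        # minimax point of this example
tau0 = [0.5, 0.5]      # candidate least favorable distribution

ok = check_theorem1(tau0, s0, v, candidates)
print(ok)
```

Here every mixture has Bayes risk $5/2$ against `tau0`, so $s_0$ is Bayes against it, and the Bayes risk equals the minimax value, which is condition (4).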

Remarks. Many other well-known results for minimax decision procedures are obtainable
as immediate consequences of Theorem 1. It is also well-known that when $\Theta$ is
not compact, a minimax decision procedure may exist and there may be no corresponding
least favorable distribution. This is a consequence of the nature of the topological
dual space $\mathcal{L}^*$ in this case. Namely, the topological dual is a set of
finitely additive set functions rather than countably additive set functions.

The next theorem provides a characterization of $(v, \mathcal{L})$ optimality in
terms of the extremal points of the intersection of the unit ball in $\mathcal{L}^*$
with $K^*$, where $K^*$ is the dual of $K$.

As before, $S$ is a convex set in $\mathcal{L}$, a partially ordered linear space
$\ne \{0\}$ of real valued functions of $\theta$ provided with a monotonic norm. Let
$v \in \mathcal{L}$ satisfy $v \le S$ and let $d = \inf_{s \in S} \|s - v\|$. The
closed unit ball in $\mathcal{L}^*$ will be denoted by $S^*$ and
$S^*_+ = S^* \cap K^*$. That is, $S^*_+$ is the set of positive linear functionals
with norm not exceeding unity.

Then,  we  have  the  following  theorem. 

Theorem 2. $s_0 \in S$ is $(v, \mathcal{L})$ optimal if and only if for every
$s \in S$ there exists a linear functional $L_s \in \mathcal{L}^*$ with the following
properties:

(1) $\|L_s\| = 1$, (2) $L_s$ is an extremal point of $S^*_+$,
(3) $L_s(s) \ge L_s(s_0)$, (4) $L_s(s_0 - v) = \|s_0 - v\|$.

Proof. Sufficiency. For every $s \in S$,
$\|s_0 - v\| = L_s(s_0 - v) \le L_s(s - v) \le \|L_s\| \, \|s - v\| = \|s - v\|$.

• • 

Necessity. Assume $s_0 = v$. Then since $S^*$ is weak* compact (by Alaoglu's
theorem) and $K^*$ is weak* closed, it follows that $S^*_+$ is weak* compact. Also
$S^*_+$ is convex. By the Krein-Milman theorem $S^*_+$ is the weak* closure of the
convex hull of its extremal points. Let $E(S^*_+)$ denote these extremal points. By
Theorem 1, $S^*_+$ contains an $L$ with $\|L\| = 1$ and therefore $E(S^*_+)$ contains
an $L$ with $\|L\| = 1$. Also $L(s - s_0) = L(s - v) \ge 0$ for all $s \in S$, since
$v \le S$. Thus for every $s \in S$, it suffices to set $L_s = L$.

s 

Now assume $s_0 \ne v$. Let
$A = \{L \in S^*_+ : L(s_0 - v) = \|s_0 - v\|\}$. By Theorem 1, $A \ne \emptyset$.
Further, $A$ is a weak* compact convex extremal subset of $S^*_+$. For fixed
$s \in S$, let
$B_s = \{L \in A : L(s - s_0) = \sup_{L' \in A} L'(s - s_0)\}$. Obviously, $B_s$ is a
non-empty convex weak* compact extremal subset of $A$. Thus there exists an extremal
point $L_s$ of $B_s$ which necessarily is an extremal point of $A$ and consequently
an extremal point of $S^*_+$. Then, by Theorem 1, there is an $L$ in $A$ such that
$L(s_0) = \inf_{s \in S} L(s)$ and therefore
$0 \le L(s - s_0) \le \sup_{L' \in A} L'(s - s_0) = L_s(s - s_0)$, establishing the
theorem.


The third characterization theorem uses the notion of a Kolmogorov fundamental
system, which we denote by K-system.

A subset $T$ of $S^*$ is said to be a K-system if $T$ is weak* closed and if for
all $x \in K$, the set $T_x = \{L \in T : L(x) = \|x\|\} \ne \emptyset$. This leads
to the following theorem.

Theorem 3. $s_0 \in S$ is $(v, \mathcal{L})$ optimal if and only if, for any K-system
$T$, for every $s \in S$ there is an $L_s \in T_{s_0 - v}$ with
$L_s(s) \ge L_s(s_0)$.

S so 

Proof. Sufficiency. For $s \in S$ and $L_s \in T_{s_0 - v}$ with
$L_s(s) \ge L_s(s_0)$, we have

$\|s_0 - v\| = L_s(s_0 - v) \le L_s(s - v) \le \|s - v\|$.

Necessity. Now assume $s_0$ is $(v, \mathcal{L})$ optimal and $s \in S$. Since
$T_{s_0 - v}$ is weak* compact, there is an $L_s \in T_{s_0 - v}$ such that
$L_s(s - s_0) = \sup_{L \in T_{s_0 - v}} L(s - s_0)$. The theorem will be established
if we can show that $\sup_{L \in T_{s_0 - v}} L(s - s_0) = \gamma_s \ge 0$. Therefore
assume $\gamma_s < 0$. Let $H_s = \{L \in T : L(s - s_0) \ge 0\}$. For
$L \in T_{s - v}$,

$L(s_0 - v) \le \|L\| \, \|s_0 - v\| \le \|s_0 - v\| \le \|s - v\| = L(s - v)$.

Thus for such $L$, $L(s) \ge L(s_0)$ and $T_{s - v} \subset H_s$, insuring that
$H_s \ne \emptyset$. Further, $H_s$ is weak* compact and hence there exists an
$L'_s \in H_s$ such that
$L'_s(s_0 - v) = \sup_{L \in H_s} L(s_0 - v) = \alpha_s$. Now $L(s - s_0) < 0$ for
all $L \in T_{s_0 - v}$; consequently $T_{s_0 - v} \cap H_s = \emptyset$. Therefore
for $L \in H_s$, $L(s_0 - v) < \|s_0 - v\|$ and $\alpha_s < \|s_0 - v\|$. Also
$s \ne s_0$, since otherwise we would have $L(s - s_0) = 0$ for all
$L \in \mathcal{L}^*$, but for $L \in T_{s_0 - v}$, $L(s - s_0) < 0$. Therefore there
exists a positive number $t$ with
$t < \min\{1,\ (\|s_0 - v\| - \alpha_s)/\|s - s_0\|\}$. Since
$(1 - t)s_0 + ts - v \in K$ and $(1 - t)s_0 + ts \in S$, it follows that there is an
$L_1 \in T$ with

$L_1((1 - t)s_0 + ts - v) = \|(1 - t)s_0 + ts - v\| \ge \|s_0 - v\|$.

$L_1 \notin H_s$, since otherwise

$L_1(s_0 - v) + tL_1(s - s_0) \le \alpha_s + tL_1(s - s_0) \le \alpha_s + t\|s - s_0\| < \alpha_s + \|s_0 - v\| - \alpha_s = \|s_0 - v\|$,

a contradiction. Further, $L_1$ is not in $T \setminus H_s$, since otherwise

$L_1(s_0 - v) + tL_1(s - s_0) < L_1(s_0 - v) \le \|s_0 - v\|$.

Thus assuming $\gamma_s < 0$ leads to a contradiction.
Remarks. The methods used in the proofs of Theorems 1 and 2 have been substantially
influenced by the work of F. R. Deutsch and P. H. Maserick [2]. The results contained
therein are similar to standard theorems of approximation theory, but with one
significant difference. As a consequence of the assumption $v \le S$, the
characterizing linear functionals are positive linear functionals. Theorem 3 is an
adaptation of results of V. N. Nikol'skii [9, 10] and was reported by G. Heindl in [6].

We conclude this section with a result which is elementary, but nevertheless
appears to be new. It gives a simple relationship between $(v, \mathcal{L})$
optimality and admissibility.

Theorem 4. If $s_0 \in S$ is the unique $(v, \mathcal{L})$ optimal decision
procedure, then $s_0$ is necessarily admissible.

Proof. Assume $s_0 \in S$ is $(v, \mathcal{L})$ optimal and inadmissible. Then there
exists an $s_1 \in S$ with $s_1 < s_0$. Consequently, $0 \le s_1 - v < s_0 - v$ and
$\|s_1 - v\| \le \|s_0 - v\|$. Hence, $s_1$ is also $(v, \mathcal{L})$ optimal. Thus
$(v, \mathcal{L})$ optimality and inadmissibility imply non-uniqueness.
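
The argument of Theorem 4 can be illustrated with a minimal sketch, assuming a two-point parameter space and the supremum norm. A finite set of hypothetical risk vectors is used purely for brevity; the proof above uses only the monotonic norm and the definition of optimality. The names `dominates` and `sup_dist` are hypothetical.

```python
def dominates(s1, s2):
    """True if s1 < s2 in the ordering induced by admissibility:
    s1 is nowhere larger and somewhere strictly smaller than s2."""
    return (all(a <= b for a, b in zip(s1, s2))
            and any(a < b for a, b in zip(s1, s2)))

def sup_dist(s, v):
    """Supremum-norm distance between s and v on a finite parameter space."""
    return max(abs(a - b) for a, b in zip(s, v))

# Hypothetical risk vectors; v = 0 lies below all of them.
risks = [[1.0, 1.0], [1.0, 0.5], [2.0, 0.0]]
v = [0.0, 0.0]

d = min(sup_dist(s, v) for s in risks)
optimal = [s for s in risks if sup_dist(s, v) == d]
inadmissible = [s for s in optimal if any(dominates(t, s) for t in risks)]
print(optimal, inadmissible)
```

In this example the optimal point $(1, 1)$ is inadmissible, and, as the theorem requires, it is not the unique optimal point: the dominating point $(1, 0.5)$ is optimal as well.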

3. Concluding Remarks. The above paper constitutes an attempt to utilize some recent
mathematical developments in formulating notions of statistical optimality. It is
hoped that this will provide insight into statistical theory and the essential
differences between various statistical philosophies.